Building a resource for studying translation shifts 



Lea Cyrus 






Arbeitsbereich Linguistik, University of Münster 
Hüfferstraße 27, 48149 Münster, Germany 
lea@uni-muenster.de 

Abstract 

This paper describes an interdisciplinary approach which brings together the fields of corpus linguistics and translation studies. It presents 
ongoing work on the creation of a corpus resource in which translation shifts are explicitly annotated. Translation shifts denote departures 
from formal correspondence between source and target text, i. e. deviations that have occurred during the translation process. A resource 
in which such shifts are annotated in a systematic way will make it possible to study those phenomena that need to be addressed if machine 
translation output is to resemble human translation. The resource described in this paper contains English source texts (parliamentary 
proceedings) and their German translations. The shift annotation is based on predicate-argument structures and proceeds in two steps: 
first, predicates and their arguments are annotated monolingually in a straightforward manner. Then, the corresponding English and 
German predicates and arguments are aligned with each other. Whenever a shift - mainly grammatical or semantic - has occurred, the 
alignment is tagged accordingly. 



1. Introduction 

Recent years have shown a growing interest in bi- or multi- 
lingual linguistic resources. In particular, parallel corpora 
(or translation corpora) have become increasingly popu- 
lar as a resource for various machine translation applica- 
tions. So far, the linguistic annotation of these resources has 
mostly been limited to sentence or word alignment, which 
can be done largely automatically. However, this type of 
alignment reveals only a small part of the relationship that 
actually exists between a source text and its translation. In 
fact, in most cases, straightforward correspondences are the 
exception rather than the rule, because translations deviate 
in many ways from their originals: they contain numerous 
shifts. 

The notion of shift is an important concept in translation
studies (see Section 2). However, shifts have not yet 
been dealt with extensively and systematically in corpus 
linguistics. This paper presents an ongoing effort to build 
a resource (FuSe) in which shifts (in translations from 
English to German) are annotated explicitly on the basis 
of predicate-argument structures, thus making translation 
equivalence visible. 

When finished, the resource will open up a possibility for 
linguists and translation theorists to investigate the corre- 
spondences and shifts empirically, but also for researchers 
in the field of machine translation, who can use this re- 
source to detect the problems they still have to address if 
they want to make their output resemble human translation. 
The FuSe annotation project is described in more detail in 
Section 3, and Section 4 gives an overview of the way it 
relates to other work. 

2. Translation Shifts 

The investigation of shifts has a long-standing tradition 
in translation studies. Vinay and Darbelnet (1958), working 
in the field of comparative stylistics, developed a system 
of translation procedures. Some of them are more or 
less direct or literal, but some of them are oblique and result 
in various differences between the source and the target 
text. These procedures are called transposition (change 
in word class), modulation (change in semantics), equivalence 
(completely different translation, e. g. proverbs), and 
adaptation (change of situation due to cultural differences). 
There is a slight prescriptive undertone in the work of Vinay 
and Darbelnet, because they state that oblique procedures 
should only be used if a more direct one would lead to a 
wrong or awkward translation. Nevertheless, their approach 
to translation shifts, even though avant la lettre, continues 
to be highly influential. 

The actual term shift was introduced by Catford (1965), 
who distinguishes formal correspondence, which exists between 
source and target categories that occupy approximately 
the same place in their respective systems, and 
translational equivalence, which holds between two por- 
tions of texts that are actually translations of each other. A 
shift has occurred if there are "departures from formal cor- 
respondence" (p. 73) between source and target text, i. e. 
if translational equivalents are not formal correspondents. 
According to Catford, there are two major types of shifts: 
level shifts and category shifts. Level shifts are shifts be- 
tween grammar and lexis, e. g. the translation of verbal as- 
pect by means of an adverb or vice versa. Category shifts 
are further subdivided into structure shifts (e. g. a change in 
clause structure), class shifts (e. g. a change in word class), 
unit shifts (e. g. translating a phrase with a clause), and 
intra-system shifts (e. g. a change in number even though 
the languages have the same number system). One of the 
problems with Catford's approach is that it relies heavily 
on the structuralist notion of system and thus presupposes 
that it is feasible - or indeed possible - to determine and 
compare the valeurs of any two given linguistic items. His 
account remains theoretical and, at least to my knowledge, 
has never been applied to any actual translations, not even 
by himself. 

The comparative model by van Leuven-Zwart (1989) has been 
devised as a practical method for studying syntactic, se- 
mantic, stylistic, and pragmatic shifts within sentences, 
clauses, and phrases of literary texts and their transla- 
tions. 1 It consists of four steps. Firstly, the units to be com- 

1 There is also a descriptive model, in which the results from 



pared must be established. Van Leuven-Zwart calls them 
transemes, and they consist of predicates and their argu- 
ments or of predicateless adverbials. Secondly, the com- 
mon denominator of the source and the target text transeme 
- van Leuven-Zwart calls this the architranseme - must 
be determined. In a third step, the relationship between 
each transeme and the architranseme - either synonymic 
or hyponymic - is established. Finally, the two transemes 
are compared with each other. If both are synonymic with 
the architranseme, no shift has occurred. Otherwise, there 
are three major categories of shifts: modulation (if one 
transeme is a synonym and the other a hyponym), modifi- 
cation (if both transemes are hyponymic with respect to the 
architranseme), and mutation (if there is no relationship be- 
tween the transemes). There are a number of subcategories 
for each type of shift: the whole list comprises 37 items, 
which is why the model has sometimes been criticized for 
being too complex to be applied consistently. 
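
The four-step comparison can be illustrated with a minimal sketch. It only covers the three major categories, not the 37 subcategories, and it assumes that the relation of each transeme to the architranseme has already been determined by hand. The names `Relation` and `classify_shift` are hypothetical, and modelling mutation as a missing relation is a simplifying assumption of this sketch.

```python
from enum import Enum


class Relation(Enum):
    """Relationship of a transeme to the architranseme."""
    SYNONYMIC = "synonymic"
    HYPONYMIC = "hyponymic"
    NONE = "none"  # assumption: no relationship could be established


def classify_shift(source: Relation, target: Relation) -> str:
    """Return the major shift category for one pair of transemes."""
    if Relation.NONE in (source, target):
        return "mutation"
    if source is Relation.SYNONYMIC and target is Relation.SYNONYMIC:
        return "no shift"
    if source is Relation.HYPONYMIC and target is Relation.HYPONYMIC:
        return "modification"
    # Remaining case: exactly one synonym and one hyponym.
    return "modulation"
```

For instance, `classify_shift(Relation.SYNONYMIC, Relation.HYPONYMIC)` yields `"modulation"`.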

3. The FuSe Annotation Project 

3.1. The Data 

The data annotated in FuSe are taken from the Europarl 
corpus (Koehn, 2002) 2, which contains proceedings of the 
European parliament. In a resource designed for studying 
translation shifts, it is not enough that the data be parallel. 
It is of vital importance that they are actually translations 
of each other. 3 Since many translation shifts are directional 
(e. g. passivisation), the direction of the translation must 
also be clear (in this case from English into German). We 
used the language attribute provided by the corpus to ex- 
tract those sentences that were originally English. In the 
corpus, the language attribute is only used if the language 
of the corpus file does not correspond with the original lan- 
guage. Thus, we extracted those sentences from the English 
subcorpus that had no language attribute and were aligned 
to sentences with the language attribute "EN" in the Ger- 
man subcorpus. Furthermore, in order to ensure that the 
English source sentences were produced by native speak- 
ers, we matched the value of the name attribute against the 
list of British and Irish Members of Parliament, which is 
available on the Europarl website. 4 
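
The selection criteria described above can be summarised in a short sketch. It assumes that the sentence-aligned corpus has already been read into simple records; the `Sentence` class, its field names, and the function names are hypothetical and do not reflect the actual XCES markup or the scripts used in the project.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Sentence:
    text: str
    speaker: str
    # None means: no language attribute, i.e. the original language
    # is the language of the corpus file itself.
    language: Optional[str] = None


def is_english_original(en: Sentence, de: Sentence, native_speakers: Set[str]) -> bool:
    """True if the pair is an English original with a German translation."""
    return (
        en.language is None                # originally English
        and de.language == "EN"            # German side marked as translated from English
        and en.speaker in native_speakers  # British or Irish Member of Parliament
    )


def extract_pairs(
    aligned: Iterable[Tuple[Sentence, Sentence]], native_speakers: Set[str]
) -> List[Tuple[Sentence, Sentence]]:
    """Keep only those aligned pairs that satisfy all three criteria."""
    return [(en, de) for en, de in aligned if is_english_original(en, de, native_speakers)]
```

Pairs in which both sides carry a language attribute (translations from a third language) are filtered out by the first condition.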

3.2. Predicates and Arguments as Transemes 

The first step in annotating translation shifts is determin- 
ing the transemes, i. e. those translation units on which the 
comparison between source and target text will be based. It 
was mentioned in Section 2 that the transemes originally 
used by van Leuven-Zwart (1989) consist of predicates and 
their arguments (and adverbials). The disadvantage with 
this division is that the transemes are quite complex (whole 
clauses), which means that there could occur several shifts 

the comparative model are used to gain insight into shifts on the 
story level and into the norms governing the translation process 
(van Leuven-Zwart, 1990). This model is not further discussed, because 
it is not related to the approach presented in this paper. 

2 We use the XCES version by Tiedemann and Nygaard (2004). 

3 The Europarl corpus is available in eleven languages, so large 
parts of the English and German subcorpora will be translated 
from a third language. 

4 http://www.europarl.eu.int/ 



within one transeme. While this seems to have been un- 
problematic for van Leuven-Zwart, who worked with pen 
and paper, the units must be smaller in a computational an- 
notation project in order for the shifts to be assigned unam- 
biguously. 

The approach presented in this paper is also based on 
predicate-argument structures, because it is assumed that 
these capture the major share of the meaning of a sentence 
and are most likely to be represented in both source and 
target sentence. However, unlike in van Leuven-Zwart's approach, 
each predicate (lexical verbs, certain nouns and certain 
adjectives) and each argument represents a transeme 
in itself, i. e. there are predicate transemes and argument 
transemes. Of course, even this more fine-grained annota- 
tion entails that correspondences and shifts on other levels 
will be missed, but in order to ensure workability and re- 
producibility of the annotation, this restriction seems justi- 
fiable. 

The predicate-argument structures are annotated mono- 
lingually, and since the annotation is mostly a means 
to an end, it is kept deliberately simple: predicates are 
represented by the capitalised citation form of the lex- 
ical item (e. g. DRAMATISE). They are assigned a class 
based on their syntactic form (v, n, a, c, l for 'verbal', 
'nominal', 'adjectival', 'copula', and 'light verb construc- 
tion' respectively). Homonymous predicates are disam- 
biguated for word senses, and related predicates (e. g. a verb 
and its nominalisation) are assigned to a common pred- 
icate group. In order to facilitate the annotation process, 
the arguments are given short intuitive role names (e. g. 
ENT_DRAMATISED, i. e. the entity being dramatised). These 
role names have to be used consistently only within a pred- 
icate group. If, for example, an argument of the predicate 
DRAMATISE has been assigned the role ENT_DRAMATISED 
and the annotator encounters a comparable role as an ar- 
gument to the predicate DRAMATISATION, the same role 
name for this argument has to be used. Other than that, no 
attempt at generalisation along the lines of semantic cases 
is made. 
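
A minimal sketch of how such a monolingual annotation might be represented is given below. The class and field names are hypothetical and are not taken from the FuSe database schema; the sketch merely illustrates that role names are shared within a predicate group and nowhere else.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Predicate classes as listed above.
PREDICATE_CLASSES = {
    "v": "verbal",
    "n": "nominal",
    "a": "adjectival",
    "c": "copula",
    "l": "light verb construction",
}


@dataclass
class PredicateGroup:
    """Related predicates (e.g. a verb and its nominalisation) share role names."""
    name: str
    roles: Set[str] = field(default_factory=set)


@dataclass
class Argument:
    role: str          # short intuitive role name, e.g. ENT_DRAMATISED
    tokens: List[str]  # the tokens realising the argument


@dataclass
class Predicate:
    lemma: str         # capitalised citation form, e.g. DRAMATISE
    pclass: str        # one of the keys of PREDICATE_CLASSES
    sense: int         # word sense number for homonymous predicates
    group: PredicateGroup
    arguments: List[Argument] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.pclass not in PREDICATE_CLASSES:
            raise ValueError(f"unknown predicate class: {self.pclass}")

    def add_argument(self, role: str, tokens: List[str]) -> None:
        """Attach an argument and register its role name with the group."""
        self.group.roles.add(role)
        self.arguments.append(Argument(role, tokens))


# The verb and its nominalisation belong to one group and share role names.
group = PredicateGroup("dramatise")
verb = Predicate("DRAMATISE", "v", 1, group)
noun = Predicate("DRAMATISATION", "n", 1, group)
verb.add_argument("ENT_DRAMATISED", ["It"])
```

Because both predicates point to the same group object, an annotator tool built along these lines could offer `ENT_DRAMATISED` as an existing role when DRAMATISATION is annotated later.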

If a predicate is realised in a way that might influence the re- 
alisation of its argument structure in a systematic way (e. g. 
infinitive, passive), it receives a tag to indicate this. If one 
of the arguments is a relative pronoun, its antecedent is also 
annotated. This is done in order to avoid the annotation of a 
pronominalisation shift (see Section 3.3.1) in these cases, 
since the antecedent of relative pronouns is so close that 
it would be wrong to call this a pronominalisation. Apart 
from this, there is no anaphor resolution. 

3.3. Shift Annotation 

After the predicate-argument structures have been anno- 
tated monolingually, the source predicates and arguments 
are aligned to their target counterparts. Sometimes, this is 
possible in a straightforward manner, as in sentence pair (1). 5 




5 Predicate transemes are in bold face, argument transemes are 
in square brackets. For the sake of clarity, the predicate and argu- 
ment names are omitted. 



(1) a. [I] refer [to item 11 on the order of business]. 6 

    b. [Ich] beziehe mich [auf Punkt 11 des    Arbeitsplans]. 
        I    refer   me    on  point 11 of.the workplan. 

However, more often than not the relationship will not be 
this simple. Whenever a shift occurs, the alignment be- 
tween the two predicates or arguments is tagged. Mainly, 
the shifts are categorised according to whether they oc- 
cur on the level of grammar or on the level of semantics. 
The following is an introduction to the main types of shifts. 
They are first described in Sections 3.3.1 to 3.3.3, and to 
make this more concrete, a few examples are given in Section 
3.3.4. 

3.3.1. Grammatical Shifts 

Category Change This tag is assigned whenever the cor- 
responding transemes belong to different syntactic cate- 
gories, and it can be applied both to predicates and ar- 
guments. A typical example would be a verbal predicate 
transeme that is translated as a nominal predicate (nomi- 
nalisation). 

Passivisation This tag can only be assigned to the align- 
ment between verbal predicates (and certain light verb con- 
structions) and is used if an active predicate has been ren- 
dered as a passive predicate. Often, but not always, a pas- 
sivisation shift goes hand in hand with a deletion shift (see 
below), namely if the source subject is no longer explicitly 
expressed in the passivised translation. 

Depassivisation Conversely, if a passive verbal predicate 
has been rendered as an active verbal predicate, this is 
tagged depassivisation. If the source predicate-argument 
structure lacks the agentive argument, there will also be an 
addition shift (see below). 

Pronominalisation This tag can only be assigned to the 
alignment between arguments. It is used if the source argu- 
ment is realised by lexical material (or a proper name) but 
translated as a pronoun. This tag is not used if the pronoun 
in question is a relative pronoun, because the antecedent 
can always be found in close vicinity and is annotated as 
part of the transeme (see Section 3.2). 

Depronominalisation This tag can only be assigned to 
the alignment between arguments. It is used if a source 
argument transeme is realised as a pronoun but translated 
with lexical material or a proper name. 

Number Change This tag is assigned if the corre- 
sponding transemes differ in number, i. e. one is singular, 
the other plural. This happens mainly between argument 
transemes, but can also occur between nominal predicates. 

3.3.2. Semantic Shifts 

Semantic Modification This tag is assigned if the two 
transemes are not straightforward equivalents of each other 
because of some type of semantic divergence, for example 
a difference in aktionsart between two verbal predicates. 



6 Opus/Europarl (en): file ep-00-01-18.xml, sentence id 4.2 



It is rather difficult to find objective criteria for this shift. In 
the majority of cases two corresponding transemes exhibit 
some kind of divergence if taken out of their context, but 
are more or less inconspicuous translations in the concrete 
sentence pair. Since an inflationary use of this tag would de- 
crease its expressiveness, semantic likeness is interpreted 
somewhat liberally and the tag is assigned only if the se- 
mantic difference is significant. Of course, this is far from 
being a proper operationalisation, and we hope to clarify 
the concept as we go along. 

Explicitation This is a subcategory of semantic modifi- 
cation, which is assigned if the target transeme is lexically 
more specific than the source transeme. A clear case of 
explicitation is when extra information has been added to 
the transeme. One could also speak of explicitation when a 
transeme has been depronominalised (see Section 3.3.1). 
However, since the depronominalisation shift is already 
used in these cases, this would be redundant and is therefore 
not annotated. 

Generalisation This is the counterpart to the explicita- 
tion shift and is used when the target transeme is lexically 
less specific than its source, and in particular if some infor- 
mation has been left out in the translation. To avoid redun- 
dancy, it is not used for pronominalisation shifts. 

Addition This tag is assigned to a target transeme, either 
predicate or argument, that has been added in the translation 
process. For instance, if there has been a depassivisation 
shift and if the agentive argument had not been realised in 
the source text, it must be added in the target text. Note that 
we do not speak of addition if only part of the transeme 
has been added. In this case, the explicitation tag is to be 
assigned (see above). 

Deletion This tag is assigned to a source transeme that is 
untranslated in the target version of the text. Analogous to 
the addition shift, this tag is only used if the entire transeme 
has been deleted. If it is only part of a transeme that is un- 
translated, the shift is classified as generalisation. 

Mutation This tag is used if it is possible to tell that two 
transemes are translation equivalents (in the sense intended 
by Catford, see Section 2), but if they differ radically in 
their lexical meaning. This shift often involves a number of 
other shifts as well. 

3.3.3. Problematic Issues 

Long Transemes Normally, a maximum of two shifts can 
be assigned to any one pair of transemes: a grammatical and 
a semantic shift. This can be a problem if the transemes 
are long, like for instance clausal arguments. Because of 
their length, they can contain multiple shifts, and it is dif- 
ficult to determine which of them is to be the basis for 
the shift annotation, in particular if they are contradictory 
(e. g. there might occur both generalisation and explicita- 
tion in different parts of the transeme). The general rule 
here is to check whether the shift actually affects the over- 
all transeme. In most cases, long transemes contain further 
transemes, e. g. clausal arguments contain at least one ex- 
tra predicate plus arguments, which will be represented by 
their own predicate-argument structure, and it is on this 
level that these shifts are recorded. 
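
The tag inventory of Sections 3.3.1 and 3.3.2, together with the two-tag limit mentioned above, can be sketched as a small consistency check. This is an illustration only: the enum and function names are hypothetical, and the applicability rules are simplified to a distinction between predicate and argument transemes (the restriction of passivisation to verbal predicates and certain light verb constructions is not modelled).

```python
from enum import Enum
from typing import List, Union


class Grammatical(Enum):
    CATEGORY_CHANGE = "category change"
    PASSIVISATION = "passivisation"
    DEPASSIVISATION = "depassivisation"
    PRONOMINALISATION = "pronominalisation"
    DEPRONOMINALISATION = "depronominalisation"
    NUMBER_CHANGE = "number change"


class Semantic(Enum):
    SEMANTIC_MODIFICATION = "semantic modification"
    EXPLICITATION = "explicitation"
    GENERALISATION = "generalisation"
    ADDITION = "addition"
    DELETION = "deletion"
    MUTATION = "mutation"


# Tags that are restricted to one kind of transeme.
PREDICATE_ONLY = {Grammatical.PASSIVISATION, Grammatical.DEPASSIVISATION}
ARGUMENT_ONLY = {Grammatical.PRONOMINALISATION, Grammatical.DEPRONOMINALISATION}


def check_tags(kind: str, tags: List[Union[Grammatical, Semantic]]) -> List[str]:
    """Return a list of problems for the tags on one aligned pair.

    `kind` is either "predicate" or "argument".
    """
    problems = []
    grammatical = [t for t in tags if isinstance(t, Grammatical)]
    semantic = [t for t in tags if isinstance(t, Semantic)]
    if len(grammatical) > 1:
        problems.append("more than one grammatical shift")
    if len(semantic) > 1:
        problems.append("more than one semantic shift")
    for tag in grammatical:
        if kind == "argument" and tag in PREDICATE_ONLY:
            problems.append(f"{tag.value} cannot be assigned to arguments")
        if kind == "predicate" and tag in ARGUMENT_ONLY:
            problems.append(f"{tag.value} cannot be assigned to predicates")
    return problems
```

An empty result means that the combination respects the limit of one grammatical and one semantic shift per pair of transemes.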



Lexical Modals Modal auxiliaries are currently not an- 
notated as separate predicates. This is no problem as long 
as the modality is expressed by means of a modal auxil- 
iary in both languages. However, sometimes modality is ex- 
pressed by a full verb with modal meaning (e. g. to wish), 
which is consequently annotated as a predicate. If the other 
language uses a modal auxiliary, no alignment is possible, 
because there is no predicate transeme. Normally, when a 
predicate transeme has no correspondent in the other lan- 
guage, one would assign the addition or deletion shift, but 
since nothing really has been added or deleted, this is not 
a particularly satisfying solution. One way out would be 
to rethink our attitude towards modals and simply annotate 
them as predicates. While the decision is still pending, such 
predicates are tagged dangling modal. 

Structure Shifts It also happens that a transeme cannot 
be aligned because it is not realised as part of a predicate- 
argument structure in the other language. An example of 
this would be a full verb with modal meaning that is ren- 
dered as an adverb in the other language (e. g. to wish - 
gern, 'with pleasure'). Again, it would not be adequate to 
speak of addition or deletion. However, since these cases 
constitute real structural shifts, the additional tag non-pas 
(i.e. 'non-predicate-argument-structure') has been intro- 
duced to deal with them. 

3.3.4. Examples 

In this section, the shift annotation described in the previous 
sections is illustrated by a few examples from the corpus. 

(2) a. [It] should not be dramatised [into something more than that]. 7 
    b. [Wir] sollten [die ganze Sache] nicht weiter  aufbauschen. 
        We   should   the whole thing  not   further exaggerate. 

Both sentences contain one predicate transeme 
(DRAMATISE and AUFBAUSCHEN) and two argument 
transemes. The two predicates differ with respect to voice: 
while the source predicate in (2-a) is passive, its German 
counterpart (2-b) is active, so the alignment between these 
two predicates would receive the depassivisation tag. As a 
consequence of the change of voice, the agentive argument, 
which is left unexpressed in the passive source sentence, 
is explicitly expressed in the German translation (Wir, 
'we'), and is consequently tagged addition. Conversely, 
the argument into something more than that is left unexpressed 
in the German version - this is an instance of deletion. 
Furthermore, the subject of the English sentence (it), the 
entity that is being dramatised, is expressed lexically in the 
translation. The alignment between these two arguments is 
thus tagged as depronominalisation. 
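
To make the result of this analysis more tangible, the alignment of Example (2) can be written down as a small table of records. The tuple layout and the helper function are hypothetical; they only mirror the tags discussed above and are not the storage format used by the annotation tools. The check deliberately ignores the special tags for dangling modals and structure shifts.

```python
from typing import List, Optional, Tuple

# (source transeme, target transeme, shift tag); None marks "not aligned".
Row = Tuple[Optional[str], Optional[str], str]

example_2: List[Row] = [
    ("DRAMATISE", "AUFBAUSCHEN", "depassivisation"),
    (None, "Wir", "addition"),
    ("into something more than that", None, "deletion"),
    ("It", "die ganze Sache", "depronominalisation"),
]


def unaligned_rows_are_consistent(rows: List[Row]) -> bool:
    """Unaligned target transemes must be additions, unaligned source ones deletions."""
    for source, target, tag in rows:
        if source is None and tag != "addition":
            return False
        if target is None and tag != "deletion":
            return False
    return True


tags = [tag for _, _, tag in example_2]
```

Here `tags` lists the four shifts in the order in which they are introduced in the discussion of Example (2).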

(3) a. [. . . ] [we] agreed yesterday to have [the Bourlanges report] [on today's agenda]. 8 
    b. [Wir] kamen gestern   überein, [den Bericht Bourlanges] 
        We   came  yesterday agreed,   the report  Bourlanges 
       [auf die Tagesordnung von heute] zu setzen. 
        on  the agenda       of  today  to put. 

7 Opus/Europarl (en): file ep-00-01-18.xml, sentence id 8.4 
8 Opus/Europarl (en): file ep-00-01-18.xml, sentence id 11.1 
(abbreviated for convenience). 

In this sentence pair, the alignment between the two predi- 
cate transemes HAVE and SETZEN is tagged semantic mod- 
ification because they differ in aktionsart: the English pred- 
icate is static, while the German predicate is telic. 

(4) a. [I] do not want to drag up [the issue of this building] endlessly [. . . ] 9 
    b. [Ich] will nicht endlos    [auf diesem Thema] herumreiten   [. . . ] 
        I    want not   endlessly  on  this   topic  keep.on.about [. . . ] 

Example (4) illustrates the use of the generalisation shift. 
The second argument transeme in the original (4-a) contains 
explicit information on what the issue is about. This 
information is left out in the translation (4-b) with the result 
that the transeme is more general. Since it is only a part 
of the transeme that has been dropped in the translation, this 
is not annotated as deletion. 

3.4. Tools 

The (monolingual) predicate-argument structures are annotated 
with FuSer (Pyka and Schwall, 2006). The annotator 
is presented with a sentence and, if available, 10 a graphical 
view of its syntactic structure, and selects those tokens (or 
nodes from the tree) which are to be annotated as a pred- 
icate. The annotator can choose from a list of predicates, 
or, if the predicate type is encountered for the first time, 
add a new predicate type or group to the database. Once 
the predicate is annotated, the procedure is repeated for 
the arguments of this predicate. Again, either the argument 
types are chosen from the list or added to the database. Additionally, 
the necessary tags (see Section 3.2) are added 
to the predicates and arguments. The annotation process is 
to the predicates and arguments. The annotation process is 
then repeated for all the predicate-argument structures in a 
sentence. They are annotated independently, i.e. there is no 
nesting of predicates. 

Currently, the predicate-argument structures are annotated 
manually, which is a time-consuming task. However, there 
are a couple of "wizards" under development which will 
assist the annotator. For instance, there will be a wizard 
to scan the sentence for predicate candidates or to suggest 
suitable argument types when the predicate is already in- 
cluded in the database. 

Technically, FuSer is a platform-independent Java applica- 
tion which operates on an extended ANNOTATE MySQL 
database. This data model makes it possible to be flexible 
with respect to the input data, which can be either raw (as 



9 Opus/Europarl (en): file ep-00-01-18.xml, sentence id 13.3 
(abbreviated for convenience) 

10 The original outline of FuSe also included phrase structure 
(Cyrus et al., 2004; Cyrus and Feddes, 2004), but this was shelved 
for practical reasons. However, syntactic annotation is something 
from which FuSe would definitely profit, and the tools can be used 
both on raw data and on trees. 



is currently the case) or syntactically annotated. Further- 
more, since the ANNOTATE database is only extended and 
not modified, data processed with FuSer can always be pro- 
cessed by Annotate afterwards (e. g. for further annota- 
tion). 

It is planned to extend FuSer for the bilingual alignment and 
the shift annotation. While this extension is under develop- 
ment, we use a simple Web-based alignment tool (XML, 
Perl, CGI) for this task (see Figure 1). The browser window 
is divided into three parts: in the upper third, the annotator 
can select a sentence pair. In the middle part, all 
the predicate-argument structures that have been annotated 
for these sentences are listed, with the predicates and argu- 
ments being highlighted in different colours. The annota- 
tor chooses (by means of radio buttons) two corresponding 
predicate-argument structures, which are then displayed in 
more detail in the lower window. Here, the annotator can 
align corresponding predicates and arguments with each 
other and, if necessary, choose up to two shift-tags for each 
pair of transemes from a drop-down menu. The lower win- 
dow can also be used for viewing existing annotation. 

4. Related Work 

Being interdisciplinary, this work is related both to ap- 
proaches from translation studies and to various annotation 
projects. Since the translational approaches have already 
been presented in Section 2, this section will confine itself 
to related annotation projects and the way they compare 
with FuSe. 

First of all, there are those projects that also deal 
with predicate-argument structures in some way, in particular 
FrameNet (Ruppenhofer et al., 2005) (which is 
mainly a lexicographical project but can, of course, be 
adopted for extensive corpus annotation, as is currently 
done in the SALSA project (Erk et al., 2003)), PropBank 
(Palmer et al., 2005), and NomBank (Meyers et al., 2004). 
In these projects, the predicate-argument annotation is the 
main objective, so they all try some kind of generali- 
sation by organising their predicates in semantic frames 
(FrameNet) or by following the Levin classes (PropBank, 
and for nominalisations also NomBank). In FuSe, however, 
this type of annotation is not an end in itself - predicates 
and their arguments simply constitute the transemes. Con- 
sequently, their annotation is kept deliberately simple and 
is entirely predicate-group specific without any attempt at 
generalisation. 

What distinguishes FuSe from all of the above mentioned 
projects is that it deals not with one language, but with 
two (and potentially more) languages, and in particular with 
parallel data. It thus makes sense to also compare it with ap- 
proaches that model the relationship between original texts 
and their translations. 

In the IAMTC project (Farwell et al., 2004), texts from six 
languages (Arabic, French, Hindi, Japanese, Korean, and 
Spanish) and their translations into English are annotated 
for interlingual content. For each original text, at least two 
English translations are being annotated (so as to be able to 
study paraphrases), and the annotation proceeds incremen- 
tally over three increasingly abstract levels of representa- 
tion. 



The difference here, apart from the languages involved, lies 
first and foremost in the type of the semantic representation. 
The semantic representation aimed at in IAMTC will be a 
full-fledged interlingua and thus far more complex than the 
predicate-argument structure in FuSe. The ultimate aim is 
to create a full semantic representation of each sentence that 
is not only independent of the actual syntactic realisation, 
but also of the language. Thus, provided there are no 
shifts, the IAMTC representations of the source and target 
language material could be identical. 
In FuSe, however, considerable parts of the sentence mean- 
ing are not captured by the predicate-argument annotation. 
Furthermore, the annotation is entirely language-specific. 
There is nothing in the database that indicates that a pred- 
icate BUY and a predicate KAUFEN can be used to express 
the same meaning in their respective languages, except for 
the fact that they are being aligned with each other. The 
predicate-argument structure is the basis of the alignment, 
but it is not an interlingua. 

Furthermore, in IAMTC, there seems to be no direct align- 
ment between the different versions of the texts. Differ- 
ences in semantics result in differences in the interlingual 
representation, but particularly shifts on the level of gram- 
mar, e.g. passivisation, are normalised even on the most 
basic level (cf. p. 58). 

As part of the Nordic Treebank Network, 11 
Volk et al. (2006) have begun to build an English-Swedish-German 
treebank in which the relationship between the 
languages is annotated by alignment on a sub-sentential 
level, i. e. between words, phrases, and clauses. In this 
respect, there is a close resemblance with FuSe. One of 
the differences is that their emphasis lies on the syntactic 
annotation of the sentences, which is not the case in FuSe. 
Second, the phrase alignment is done directly, i. e. with- 
out the predicate-argument "detour": nodes that "convey 
the same meaning and could serve as translation units" are 
aligned, and there are two types of alignment, namely exact 
and approximate. 

5. Outlook 

So far, the annotated data consist of English source texts 
that have been translated into German. It would be inter- 
esting to include the opposite direction as well, i. e. Ger- 
man source texts that have been translated into English. 
This would make it possible - by comparing the types of 
shifts and their quantity - to find out which shifts have 
occurred due to the direction of the translation process, 
and which shifts might be due to the translation process 
as such (e. g. explicitation is taken to be such a potential 
"translation universal" in current translation research, see 
Mauranen and Kujamäki (2004)). 

Furthermore, the genre of the Europarl corpus - parliamen- 
tary proceedings - is highly restricted and it would be a use- 
ful extension to include other types of data (e. g. technical 
language, literary prose) in order to compare the occurrence 
of shifts across genres. 

Figure 1: Screenshot of the Web-based alignment tool, showing the annotation of Example (2) 